Generating Synthetic Data for Text Recognition

Authors

  • Praveen Krishnan
  • C. V. Jawahar
Abstract

Generating synthetic images is an art that emulates the natural process of image generation as closely as possible. In this work, we exploit such a framework for data generation in the handwritten domain. We render synthetic data using open-source fonts and incorporate data augmentation schemes. As part of this work, we release a corpus of 9M synthetic handwritten word images, which could be useful for training deep network architectures and advancing performance in handwritten word spotting and recognition tasks.
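The abstract does not say which augmentation schemes are used, so the following is only a minimal sketch of one plausible kind: a horizontal shear (to vary stroke slant) followed by random pixel flips (to mimic noise). It assumes the font-rendering step has already produced a binary word image; a tiny hand-made bitmap stands in for that image here. The names `shear`, `add_noise`, `augment`, `GLYPH`, and the parameters `max_slant` and `flip_prob` are all hypothetical and are not taken from the paper.

```python
import random

# Stand-in for a rendered word image: 1 = ink, 0 = background.
# (Assumption: a real pipeline would obtain this by rendering text with a font.)
GLYPH = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]


def shear(image, slant):
    """Slant the strokes by shifting each row in proportion to its height.

    Positive slant pushes the top of the image to the right, negative to the
    left. The output is widened so that no ink is cropped.
    """
    height = len(image)
    max_shift = int(round(abs(slant) * (height - 1)))
    out = []
    for y, row in enumerate(image):
        # Rows nearer the top (small y) are shifted further than the baseline.
        shift = int(round(abs(slant) * (height - 1 - y)))
        if slant < 0:
            shift = max_shift - shift
        out.append([0] * shift + list(row) + [0] * (max_shift - shift))
    return out


def add_noise(image, rng, flip_prob):
    """Flip each pixel independently with probability flip_prob."""
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in image
    ]


def augment(image, rng, max_slant=0.5, flip_prob=0.02):
    """Produce one randomly sheared and noised variant of the input image."""
    slant = rng.uniform(-max_slant, max_slant)
    return add_noise(shear(image, slant), rng, flip_prob)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the variants are reproducible
    for row in augment(GLYPH, rng):
        print("".join("#" if px else "." for px in row))
```

Calling `augment` repeatedly with the same image and a shared random generator yields different variants of one rendered word, which is the general idea behind enlarging a synthetic corpus through augmentation.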

Similar resources

Generation of Synthetic Training Data for an HMM-based Handwriting Recognition System

A perturbation model for generating synthetic textlines from existing cursively handwritten lines of text produced by human writers is presented. Our purpose is to improve the performance of an HMM-based off-line cursive handwriting recognition system by providing it with additional synthetic training data. Two kinds of perturbations are applied, geometrical transformations and thinning/thicken...

When will synthetic speech sound human: role of rules and data

Text-to-speech synthesis research has moved away from building general purpose systems based on an understanding of human language and speech production towards building systems based on statistical algorithms applied to large text and speech corpora, and, recently, towards building such systems for specific domains. Despite substantial progress, the overall quality of even the best systems is ...

TO APPEAR : SDAIR 1995 Generating Synthetic

In this paper we describe work on a system for modeling errors in the output of OCR systems. The project is motivated by the desire to evaluate the performance of various text analysis systems under varying, yet controlled conditions. We describe a set of symbol and page models which are used to degrade an ideal text by introducing errors which typically occur during scanning, decomposition and...

Realistic Speech Animation of Synthetic Faces

In this study, we combined physically-based modeling and parameterization to generate realistic speech animation on synthetic faces. We used physically-based modeling for muscles. Muscles are modeled as forces deforming the mesh of polygons. A parameterization technique is used for generating mouth shapes for speech animation. Each meaningful part of a text, which is a letter in our case, correspon...

Off-line cursive handwriting recognition using synthetic training data

The objective of this thesis is to investigate the generation and use of synthetic training data for off-line cursive handwriting recognition. It has been shown in many works before that the size and quality of the training data has a great impact on the performance of handwriting recognition systems. A general observation is that the more texts are used for training, the better recognition per...

Journal:
  • CoRR

Volume: abs/1608.04224

Publication date: 2016